Parallel solution of the subset-sum problem: an empirical study

نویسنده

  • Saniyah S. Bokhari
چکیده

We investigate the parallelization of an algorithm on three very different architectures. These are: a 128-processor Cray XMT massively multithreaded machine, a 16-processor IBM x3755 shared memory machine and a 240-core NVIDIA FX 5800 graphics processor unit (GPU). The problem we use in our investigation is the wellknown subset-sum problem. While this is known to be NP-complete, it is solvable in pseudo-polynomial time, i.e., time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. The hypothesis that we wish to test is that the Cray, with its specialized hardware and large uniform shared memory, is suitable for very large problems, the IBM x3755 is suitable for intermediate sized problems and the NVIDIA FX 5800 can give superior performance only for problems that fit within its modest internal memory. We show that it is straightforward to parallelize this algorithm on the Cray XMT primarily because of the word-level locking that is available on this architecture. For the other two machines we present an alternating word algorithm that can implement an efficient solution. The timings of our respective codes were carefully measured over a comprehensive range of problem sizes. On the Cray XMT we observe very good scaling for large problems and see sustained performance as the problem size increases. However this machine has poor scaling for small problem sizes; it performs best for ii problem sizes of 10 bits or more. The IBM x3755 performs very well on medium sized problems, but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 10 bits. The NVIDIA GPU performs well for problems whose tables fit within its 4GB device memory. This corresponds to tables of size approximately 10. The experimental measurements support our hypothesis and show that each machine gives superior performance for distinct ranges of problem sizes and demonstrate the strengths and weaknesses of the three architectures.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finite Horizon Economic Lot and Delivery Scheduling Problem: Flexible Flow Lines with Unrelated Parallel Machines and Sequence Dependent Setups

This paper considers the economic lot and delivery scheduling problem in a two-echelon supply chains, where a single supplier produces multiple components on a flexible flow line (FFL) and delivers them directly to an assembly facility (AF). The objective is to determine a cyclic schedule that minimizes the sum of transportation, setup and inventory holding costs per unit time without shortage....

متن کامل

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

متن کامل

A comparison of algorithms for minimizing the sum of earliness and tardiness in hybrid flow-shop scheduling problem with unrelated parallel machines and sequence-dependent setup times

In this paper, the flow-shop scheduling problem with unrelated parallel machines at each stage as well as sequence-dependent setup times under minimization of the sum of earliness and tardiness are studied. The processing times, setup times and due-dates are known in advance. To solve the problem, we introduce a hybrid memetic algorithm as well as a particle swarm optimization algorithm combine...

متن کامل

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

متن کامل

An employee transporting problem

An employee transporting problem is described and a set partitioning model is developed. An investigation of the model leads to a knapsack problem as a surrogate problem. Finding a partition corresponding to the knapsack problem provides a solution to the problem. An exact algorithm is proposed to obtain a partition (subset-vehicle combination) corresponding to the knapsack solution. It require...

متن کامل

Parallel Jobs Scheduling with a Specific Due Date: Asemi-definite Relaxation-based Algorithm

This paper considers a different version of the parallel machines scheduling problem in which the parallel jobs simultaneously requirea pre-specifiedjob-dependent number of machines when being processed.This relaxation departs from one of the classic scheduling assumptions. While the analytical conditions can be easily statedfor some simple models, a graph model approach is required when confli...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Concurrency and Computation: Practice and Experience

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2012